Slide Deck


Advancing Academic Chatbots: Evaluation of Non-Traditional Outputs

Favero, Nicole, Salute, Francesca, Hardt, Daniel

arXiv.org Artificial Intelligence

Most evaluations of large language models focus on standard tasks such as factual question answering or short summarization. This research expands that scope in two directions: first, by comparing two retrieval strategies, Graph RAG (structured, knowledge-graph-based) and Advanced RAG (hybrid keyword-semantic search), for QA; and second, by evaluating whether LLMs can generate high-quality non-traditional academic outputs, specifically slide decks and podcast scripts. We implemented a prototype combining Meta's LLaMA 3 70B (open-weight) and OpenAI's GPT-4o mini (API-based). QA performance was evaluated using both human ratings across eleven quality dimensions and large language model judges for scalable cross-validation. GPT-4o mini with Advanced RAG produced the most accurate responses. Graph RAG offered limited improvements and led to more hallucinations, partly due to its structural complexity and manual setup. Slide and podcast generation was tested with document-grounded retrieval. GPT-4o mini again performed best, though LLaMA 3 showed promise in narrative coherence. Human reviewers were crucial for detecting layout and stylistic flaws, highlighting the need for combined human-LLM evaluation in assessing emerging academic outputs.
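
The abstract doesn't detail how Advanced RAG fuses its keyword and semantic signals, so below is a minimal, standard-library-only sketch of hybrid retrieval; the term-overlap and bag-of-words cosine scores are stand-ins for a real BM25 index and embedding model, and the fusion weight `alpha` is an assumption, not the paper's configuration.

```python
import math
from collections import Counter

def keyword_score(query: str, doc: str) -> float:
    """Term-overlap count, standing in for a BM25/keyword index."""
    q, d = set(query.lower().split()), Counter(doc.lower().split())
    return float(sum(d[t] for t in q))

def semantic_score(query: str, doc: str) -> float:
    """Bag-of-words cosine, standing in for embedding similarity."""
    a, b = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_retrieve(query: str, corpus: list[str], alpha: float = 0.5, k: int = 3):
    """Normalize each signal per query, then fuse; alpha weights the semantic side."""
    def norm(scores):
        hi = max(scores) or 1.0
        return [s / hi for s in scores]
    kw = norm([keyword_score(query, d) for d in corpus])
    sem = norm([semantic_score(query, d) for d in corpus])
    fused = [(1 - alpha) * kw_s + alpha * sem_s for kw_s, sem_s in zip(kw, sem)]
    ranked = sorted(zip(fused, corpus), reverse=True)
    return [doc for _, doc in ranked[:k]]

corpus = [
    "Graph RAG builds a knowledge graph over the corpus before retrieval.",
    "Hybrid search combines keyword matching with embedding similarity.",
    "Slide decks and podcast scripts are non-traditional academic outputs.",
]
print(hybrid_retrieve("hybrid keyword and semantic retrieval", corpus, k=2))
```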


SlideAgent: Hierarchical Agentic Framework for Multi-Page Visual Document Understanding

Jin, Yiqiao, Kaur, Rachneet, Zeng, Zhen, Ganesh, Sumitra, Kumar, Srijan

arXiv.org Artificial Intelligence

Multi-page visual documents such as manuals, brochures, presentations, and posters convey key information through layout, colors, icons, and cross-slide references. While large language models (LLMs) offer opportunities in document understanding, current systems struggle with complex, multi-page visual documents, particularly in fine-grained reasoning over elements and pages. We introduce SlideAgent, a versatile agentic framework for understanding multi-modal, multi-page, and multi-layout documents, especially slide decks. SlideAgent employs specialized agents and decomposes reasoning into three levels (global, page, and element) to construct a structured, query-agnostic representation that captures both overarching themes and detailed visual or textual cues. During inference, SlideAgent selectively activates specialized agents for multi-level reasoning and integrates their outputs into coherent, context-aware answers. Extensive experiments show that SlideAgent achieves significant improvements over both proprietary (+7.9 overall) and open-source models (+9.8 overall).
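
As a rough sketch of the three-level decomposition described above, the following hypothetical structures index a deck at global, page, and element levels and descend only where a query is relevant; the `llm.relevant`/`llm.generate` interface is an assumed placeholder, not SlideAgent's actual agent API.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    kind: str          # e.g. "chart", "icon", "text block"
    description: str   # what an element-level agent extracted

@dataclass
class Page:
    number: int
    summary: str                                     # page-level agent output
    elements: list[Element] = field(default_factory=list)

@dataclass
class DeckIndex:
    global_summary: str    # global agent output: themes across the whole deck
    pages: list[Page]

def answer(index: DeckIndex, query: str, llm) -> str:
    """Selectively descend the hierarchy: global first, pages and elements
    only where the (assumed) llm.relevant check says the query needs them."""
    context = [f"Deck summary: {index.global_summary}"]
    for page in index.pages:
        if llm.relevant(query, page.summary):          # activate page agent
            context.append(f"Page {page.number}: {page.summary}")
            for el in page.elements:                   # activate element agents
                if llm.relevant(query, el.description):
                    context.append(f"  {el.kind}: {el.description}")
    return llm.generate(query, "\n".join(context))
```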


Culturally-Aware Conversations: A Framework & Benchmark for LLMs

Havaldar, Shreya, Rai, Sunny, Cho, Young-Min, Ungar, Lyle

arXiv.org Artificial Intelligence

Existing benchmarks that measure cultural adaptation in LLMs are misaligned with the actual challenges these models face when interacting with users from diverse cultural backgrounds. In this work, we introduce the first framework and benchmark designed to evaluate LLMs in realistic, multicultural conversational settings. Grounded in sociocultural theory, our framework formalizes how linguistic style - a key element of cultural communication - is shaped by situational, relational, and cultural context. We construct a benchmark dataset based on this framework, annotated by culturally diverse raters, and propose a new set of desiderata for cross-cultural evaluation in NLP: conversational framing, stylistic sensitivity, and subjective correctness. We evaluate today's top LLMs on our benchmark and show that these models struggle with cultural adaptation in a conversational setting.
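
To make the framework concrete, here is one hypothetical shape a benchmark item and its scoring loop might take, with the situational, relational, and cultural context as separate fields and one score per desideratum; all names and interfaces are guesses rather than the paper's released schema.

```python
from dataclasses import dataclass

@dataclass
class ConversationItem:
    culture: str                  # cultural context, e.g. "Japanese workplace"
    relationship: str             # relational context, e.g. "junior to senior"
    situation: str                # situational context, e.g. "declining a request"
    dialogue: list[str]           # the conversation so far
    reference_styles: list[str]   # styles accepted by culturally diverse raters

# The paper's three desiderata for cross-cultural evaluation.
DESIDERATA = ("conversational framing", "stylistic sensitivity", "subjective correctness")

def evaluate(model, judge, item: ConversationItem) -> dict[str, bool]:
    """Score one model reply against each desideratum. `model.respond` and
    `judge` are assumed interfaces (the LLM under test and a rater or
    LLM-judge rubric); neither is specified in the abstract."""
    prompt = (
        f"Culture: {item.culture}\nRelationship: {item.relationship}\n"
        f"Situation: {item.situation}\n" + "\n".join(item.dialogue)
    )
    reply = model.respond(prompt)
    return {d: judge(reply, item.reference_styles, d) for d in DESIDERATA}
```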


Auto-Slides: An Interactive Multi-Agent System for Creating and Customizing Research Presentations

Yang, Yuheng, Jiang, Wenjia, Wang, Yang, Wang, Yiwei, Zhang, Chi

arXiv.org Artificial Intelligence

The rapid progress of large language models (LLMs) has opened new opportunities for education. While learners can interact with academic papers through LLM-powered dialogue, limitations remain: the absence of structured organization and heavy reliance on text can impede systematic understanding of and engagement with complex concepts. To address these challenges, we propose Auto-Slides, an LLM-driven system that converts research papers into pedagogically structured, multimodal slides (e.g., diagrams and tables). Drawing on cognitive science, it creates a presentation-oriented narrative and allows iterative refinement via an interactive editor to match learners' knowledge level and goals. Auto-Slides further incorporates verification and knowledge-retrieval mechanisms to ensure accuracy and contextual completeness. Extensive user studies show that Auto-Slides enhances learners' comprehension and engagement compared to conventional LLM-based reading. Our contributions lie in designing a multi-agent framework that transforms academic papers into pedagogically optimized slides and in introducing interactive customization for personalized learning.
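
A hedged sketch of how such a multi-agent flow might be wired up, with the narrative-planning, verification, and retrieval steps from the abstract as separate calls; the `llm.ask` and `retriever.lookup` interfaces are illustrative assumptions, not Auto-Slides' implementation.

```python
def auto_slides(paper_text: str, llm, retriever) -> list[str]:
    """Plan a narrative, draft slides, then verify each draft against the paper."""
    outline = llm.ask(f"Plan a presentation-oriented narrative for:\n{paper_text}")
    slides = []
    for section in outline.splitlines():
        draft = llm.ask(f"Write one multimodal slide (text plus figure/table "
                        f"references) for this section: {section}")
        # Verification agent: flag claims the source paper does not support,
        # then repair them using retrieved context (knowledge retrieval).
        issues = llm.ask(f"List claims in this slide unsupported by the paper:\n"
                         f"{draft}\n---\n{paper_text}")
        if issues.strip():
            context = retriever.lookup(issues)
            draft = llm.ask(f"Revise the slide to fix:\n{issues}\nUsing:\n{context}")
        slides.append(draft)
    return slides

def refine(slides: list[str], llm, index: int, request: str) -> list[str]:
    """Interactive editor step: apply one learner customization to one slide."""
    slides[index] = llm.ask(f"Rewrite this slide so that: {request}\n{slides[index]}")
    return slides
```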


Intent Tagging: Exploring Micro-Prompting Interactions for Supporting Granular Human-GenAI Co-Creation Workflows

Gmeiner, Frederic, Marquardt, Nicolai, Bentley, Michael, Romat, Hugo, Pahud, Michel, Brown, David, Roseway, Asta, Martelaro, Nikolas, Holstein, Kenneth, Hinckley, Ken, Riche, Nathalie

arXiv.org Artificial Intelligence

Despite Generative AI (GenAI) systems' potential for enhancing content creation, users often struggle to effectively integrate GenAI into their creative workflows. Core challenges include misalignment of AI-generated content with user intentions (intent elicitation and alignment), user uncertainty around how to best communicate their intents to the AI system (prompt formulation), and insufficient flexibility of AI systems to support diverse creative workflows (workflow flexibility). Motivated by these challenges, we created IntentTagger: a system for slide creation based on the notion of Intent Tags - small, atomic conceptual units that encapsulate user intent - for exploring granular and non-linear micro-prompting interactions for Human-GenAI co-creation workflows. Our user study with 12 participants provides insights into the value of flexibly expressing intent across varying levels of ambiguity, meta-intent elicitation, and the benefits and challenges of intent tag-driven workflows. We conclude by discussing the broader implications of our findings and design considerations for GenAI-supported content creation workflows.
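
The core data structure is easy to picture: a tag pairs a target on the slide with one atomic intent, and a set of tags compiles into a single generation request. The sketch below is an illustration of that idea, not IntentTagger's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntentTag:
    target: str    # which part of the slide the intent applies to
    intent: str    # the atomic intent itself, possibly ambiguous ("more playful")

def compile_prompt(slide_text: str, tags: list[IntentTag]) -> str:
    """Compose granular tags into one generation request, keeping each intent
    a separate instruction so users can add or remove tags non-linearly."""
    lines = [f"- On '{t.target}': {t.intent}" for t in tags]
    return f"Revise this slide:\n{slide_text}\n\nApply these intents:\n" + "\n".join(lines)

tags = [
    IntentTag(target="title", intent="make it a question"),
    IntentTag(target="chart caption", intent="emphasize the 2024 trend"),
]
print(compile_prompt("Q3 results overview ...", tags))
```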


ChatBCG: Can AI Read Your Slide Deck?

Singh, Nikita, Balian, Rob, Martinelli, Lukas

arXiv.org Artificial Intelligence

With the advanced vision capabilities of GPT-4o and Gemini Flash, an important question arises regarding the accuracy of these functionalities in practical business applications. Our assumption was that multimodal models are good at reading and summarizing charts: when given an image of a slide deck, they do a good job of summarizing key insights from it, often including relevant data points. Existing research into this question has evaluated the efficacy of LLMs when parsing tables [3], concluding that performance is highly sensitive to the input prompts. Other work evaluates LLMs' ability to read and reason over mathematical graphs [2], finding that GPT models outperform alternatives. This paper aims to explore whether multimodal models perform well on a variant of this skill: answering straightforward questions that require the models to pick out a number from a slide deck.
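
A minimal harness for this kind of test might look like the following, using the OpenAI chat completions API to ask a vision model to read one number off a slide image; the prompt wording, exact-match scoring, and model choice are assumptions, not the paper's protocol.

```python
import base64
from openai import OpenAI  # pip install openai; requires OPENAI_API_KEY to be set

client = OpenAI()

def ask_number(image_path: str, question: str) -> str:
    """Send one slide image plus a question; ask for the bare number back."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question + " Answer with the number only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()

def accuracy(examples) -> float:
    """Exact-match score over (image_path, question, gold_answer) triples."""
    hits = sum(ask_number(img, q) == gold for img, q, gold in examples)
    return hits / len(examples)
```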


Google's new AI video generator is more HR than Hollywood

Engadget

For most of us, creating documents, spreadsheets and slide decks is an inescapable part of work life in 2024. What's not is creating videos. That's something Google would like to change. On Tuesday, the company announced Google Vids, a video creation app for work that the company says can make everyone a "great storyteller" using the power of AI. Vids uses Gemini, Google's latest AI model, to quickly create videos for the workplace.


Here's How AI Will Come for Your Job

The Atlantic - Technology

Abandon all hope, ye who merge spreadsheet cells! Last week, at its annual I/O conference, Google spent hours detailing how large language models would help the knowledge workers of the world unload their busywork onto a legion of eager, capable neural networks. The company will soon introduce AI functions into programs such as Gmail, Google Sheets, and Google Slides that will allow users to type simple commands and receive complex outputs: entire email compositions, for example, or auto-generated tables. The future that Google is promising feels familiar--it's all about heightened convenience and one-click efficiency--and I hate it. Workplace AI feels like the purest distillation of a corrosive ideology that demands frictionless productivity from workers: The easier our labor becomes, the more of it we can do, and the more of it we'll be expected to do.


SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images

Tanaka, Ryota, Nishida, Kyosuke, Nishida, Kosuke, Hasegawa, Taku, Saito, Itsumi, Saito, Kuniko

arXiv.org Artificial Intelligence

Visual question answering on document images that contain textual, visual, and layout information, called document VQA, has received much attention recently. Although many datasets have been proposed for developing document VQA systems, most existing datasets focus on understanding content relationships within a single image rather than across multiple images. In this study, we propose a new multi-image document VQA dataset, SlideVQA, containing 2.6k+ slide decks composed of 52k+ slide images, with 14.5k questions about the decks. SlideVQA requires complex reasoning, including single-hop, multi-hop, and numerical reasoning, and also provides annotated arithmetic expressions for numerical answers to strengthen numerical reasoning. Moreover, we developed a new end-to-end document VQA model that treats evidence selection and question answering in a unified sequence-to-sequence format. Experiments on SlideVQA show that our model outperforms existing state-of-the-art QA models but still falls well short of human performance. We believe that our dataset will facilitate research on document VQA.
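
The unified sequence-to-sequence framing can be illustrated with the input/output shapes below; the page markers and the "evidence: ... answer: ..." target string are illustrative guesses at such a format, not the exact tokens SlideVQA's model uses.

```python
def build_input(question: str, deck: list[str]) -> str:
    """Linearize a deck (one OCR'd text string per slide) plus the question."""
    pages = " ".join(f"<page_{i}> {text}" for i, text in enumerate(deck, 1))
    return f"question: {question} context: {pages}"

def parse_output(seq: str) -> tuple[list[int], str]:
    """Decode a combined target like 'evidence: 3 5 answer: 42' into the
    selected evidence pages and the final answer."""
    evidence_part, answer = seq.split("answer:")
    pages = [int(tok) for tok in evidence_part.replace("evidence:", "").split()]
    return pages, answer.strip()

print(parse_output("evidence: 3 5 answer: 42"))  # ([3, 5], '42')
```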


New DALL-E integration adds generative AI for next-level slides

#artificialintelligence

For Tome, which calls itself the "new storytelling format for work and important ideas," integrating OpenAI's DALL-E into its flexible, interactive slide options -- which it announced today -- was a natural fit to add a generative AI dimension to decks. When OpenAI announced the release of the DALL-E API in early November, the San Francisco-based startup had its chance. "Making that a part of the storytelling creation experience just felt really natural," Tome CEO Keith Peiris told VentureBeat. "It felt so much more powerful than looking for a stock photo or clip art -- it's kind of giving us a first look at what generative storytelling can look like."